Maximum-Minimum Similarity Training for Text Extraction

نویسندگان

  • Hui Fu
  • Xiabi Liu
  • Yunde Jia
چکیده

This paper proposes a maximum-minimum similarity training algorithm to optimize the parameters in the effective method of text extraction based on Gaussian mixture modeling of neighbor characters. The maximum-minimum similarity training (MMS) methods optimize recognizer performance through maximizing the similarities of positive samples and minimizing the similarities of negative samples. Based on this approach to discriminative training, we defined the objective function for text extraction, and used the gradient descent method to search the minimum of the objective function and the optimum parameters for the text extraction method. The experimental results of text extraction show the effectiveness of MMS training in text extraction. Compared with the maximum likelihood estimation of parameters from Expectation Maximization (EM) algorithm, the training results after MMS made the performance of text extraction improved greatly. The recall rate of 98.55% and the precision rate of 93.56% were achieved. The experimental results also show that the maximum-minimum similarity (MMS) training behaved better than the commonly used discriminative training of the minimum classification error (MCE).

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

روش جدید متن‌کاوی برای استخراج اطلاعات زمینه کاربر به‌منظور بهبود رتبه‌بندی نتایج موتور جستجو

Today, the importance of text processing and its usages is well known among researchers and students. The amount of textual, documental materials increase day by day. So we need useful ways to save them and retrieve information from these materials. For example, search engines such as Google, Yahoo, Bing and etc. need to read so many web documents and retrieve the most similar ones to the user ...

متن کامل

A Comparison Of Efficacy And Assumptions Of Bootstrapping Algorithms For Training Information Extraction Systems

Information Extraction systems offer a way of automating the discovery of information from text documents. Research and commercial systems use considerable training data to learn dictionaries and patterns to use for extraction. Learning to extract useful information from text data using only minutes of user time means that we need to leverage unlabeled data to accompany the small amount of labe...

متن کامل

Automatic Segmentation of Clinical Texts - Preliminary Results

Clinical narratives, such as radiology and pathology reports, are commonly available in electronic form. However, they are also commonly entered and stored as free text, and knowledge of their structure is necessary for enhancing the productivity of the healthcare departments and facilitating research. This paper presents a preliminary study attempting to automatically segment medical reports i...

متن کامل

Distributional Semantic Models for Clinical Text Applied to Health Record Summarization

As information systems in the health sector are becoming increasingly computerized, large amounts of care-related information are being stored electronically. In hospitals clinicians continuously document treatment and care given to patients in electronic health record (EHR) systems. Much of the information being documented is in the form of clinical notes, or narratives, containing primarily u...

متن کامل

Text Mining

“Bag of words” model, acronym extraction, authorship ascription, coordinate matching, data mining, document clustering, document frequency, document retrieval, document similarity metrics, entity extraction, hidden Markov models, hubs and authorities, information extraction, information retrieval, key-phrase assignment, key-phrase extraction, knowledge engineering, language identification, link...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006